Diagnosing Human Judgments in MT Evaluation: an Example based on the Spanish Language
Abstract
This paper proposes a methodology for analyzing the reliability of human evaluation in machine translation (MT). Using the English-to-Spanish human evaluation carried out during the second TC-STAR evaluation campaign, we first demonstrate that the evaluation is reliable. We then define several methods for detecting judges who could bias the evaluation with judgments that are too strict, too permissive, or simply incoherent.